{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inference and Validation\n",
    "\n",
    "Now that you have a trained network, you can use it for making predictions. This is typically called **inference**, a term borrowed from statistics. However, neural networks have a tendency to perform *too well* on the training data and aren't able to generalize to data that hasn't been seen before. This is called **overfitting** and it impairs inference performance. To test for overfitting while training, we measure the performance on data not in the training set called the **validation** dataset. We avoid overfitting through regularization such as dropout while monitoring the validation performance during training. In this notebook, I'll show you how to do this in PyTorch. \n",
    "\n",
    "First off, I'll implement my own feedforward network for the exercise you worked on in part 4 using the Fashion-MNIST dataset.\n",
    "\n",
    "As usual, let's start by loading the dataset through torchvision. You'll learn more about torchvision and loading data in a later part."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import time\n",
    "\n",
    "import torch\n",
    "from torch import nn\n",
    "from torch import optim\n",
    "import torch.nn.functional as F\n",
    "from torchvision import datasets, transforms\n",
    "\n",
    "import helper"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define a transform to normalize the data\n",
    "transform = transforms.Compose([transforms.ToTensor(),\n",
    "                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])\n",
    "# Download and load the training data\n",
    "trainset = datasets.FashionMNIST('F_MNIST_data/', download=True, train=True, transform=transform)\n",
    "trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)\n",
    "\n",
    "# Download and load the test data\n",
    "testset = datasets.FashionMNIST('F_MNIST_data/', download=True, train=False, transform=transform)\n",
    "testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the network\n",
    "\n",
    "As with MNIST, each image in Fashion-MNIST is 28x28 which is a total of 784 pixels, and there are 10 classes. I'm going to get a bit more advanced here, I want to be able to build a network with an arbitrary number of hidden layers. That is, I want to pass in a parameter like `hidden_layers = [512, 256, 128]` and the network is contructed with three hidden layers have 512, 256, and 128 units respectively. To do this, I'll use `nn.ModuleList` to allow for an arbitrary number of hidden layers. Using `nn.ModuleList` works pretty much the same as a normal Python list, except that it registers each hidden layer `Linear` module properly so the model is aware of the layers.\n",
    "\n",
    "The issue here is I need a way to define each `nn.Linear` module with the appropriate layer sizes. Since each `nn.Linear` operation needs an input size and an output size, I need something that looks like this:\n",
    "\n",
    "```python\n",
    "# Create ModuleList and add input layer\n",
    "hidden_layers = nn.ModuleList([nn.Linear(input_size, hidden_layers[0])])\n",
    "# Add hidden layers to the ModuleList\n",
    "hidden_layers.extend([nn.Linear(h1, h2) for h1, h2 in layer_sizes])\n",
    "```\n",
    "\n",
    "Getting these pairs of input and output sizes can be done with a handy trick using `zip`.\n",
    "\n",
    "```python\n",
    "hidden_layers = [512, 256, 128, 64]\n",
    "layer_sizes = zip(hidden_layers[:-1], hidden_layers[1:])\n",
    "for each in layer_sizes:\n",
    "    print(each)\n",
    "\n",
    ">> (512, 256)\n",
    ">> (256, 128)\n",
    ">> (128, 64)\n",
    "```\n",
    "\n",
    "I also have the `forward` method returning the log-softmax for the output. Since softmax is a probability distibution over the classes, the log-softmax is a log probability which comes with a [lot of benefits](https://en.wikipedia.org/wiki/Log_probability). Using the log probability, computations are often faster and more accurate. To get the class probabilities later, I'll need to take the exponential (`torch.exp`) of the output. Algebra refresher... the exponential function is the inverse of the log function:\n",
    "\n",
    "$$ \\large{e^{\\ln{x}} = x }$$\n",
    "\n",
    "We can include dropout in our network with [`nn.Dropout`](http://pytorch.org/docs/master/nn.html#dropout). This works similar to other modules such as `nn.Linear`. It also takes the dropout probability as an input which we can pass as an input to the network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Network(nn.Module):\n",
    "    def __init__(self, input_size, output_size, hidden_layers, drop_p=0.5):\n",
    "        ''' Builds a feedforward network with arbitrary hidden layers.\n",
    "        \n",
    "            Arguments\n",
    "            ---------\n",
    "            input_size: integer, size of the input\n",
    "            output_size: integer, size of the output layer\n",
    "            hidden_layers: list of integers, the sizes of the hidden layers\n",
    "            drop_p: float between 0 and 1, dropout probability\n",
    "        '''\n",
    "        super().__init__()\n",
    "        # Add the first layer, input to a hidden layer\n",
    "        self.hidden_layers = nn.ModuleList([nn.Linear(input_size, hidden_layers[0])])\n",
    "        \n",
    "        # Add a variable number of more hidden layers\n",
    "        layer_sizes = zip(hidden_layers[:-1], hidden_layers[1:])\n",
    "        self.hidden_layers.extend([nn.Linear(h1, h2) for h1, h2 in layer_sizes])\n",
    "        \n",
    "        self.output = nn.Linear(hidden_layers[-1], output_size)\n",
    "        \n",
    "        self.dropout = nn.Dropout(p=drop_p)\n",
    "        \n",
    "    def forward(self, x):\n",
    "        ''' Forward pass through the network, returns the output logits '''\n",
    "        \n",
    "        # Forward through each layer in `hidden_layers`, with ReLU activation and dropout\n",
    "        for linear in self.hidden_layers:\n",
    "            x = F.relu(linear(x))\n",
    "            x = self.dropout(x)\n",
    "        \n",
    "        x = self.output(x)\n",
    "        \n",
    "        return F.log_softmax(x, dim=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train the network\n",
    "\n",
    "Since the model's forward method returns the log-softmax, I used the [negative log loss](http://pytorch.org/docs/master/nn.html#nllloss) as my criterion, `nn.NLLLoss()`. I also chose to use the [Adam optimizer](http://pytorch.org/docs/master/optim.html#torch.optim.Adam). This is a variant of stochastic gradient descent which includes momentum and in general trains faster than your basic SGD.\n",
    "\n",
    "I've also included a block to measure the validation loss and accuracy. Since I'm using dropout in the network, I need to turn it off during inference. Otherwise, the network will appear to perform poorly because many of the connections are turned off. PyTorch allows you to set a model in \"training\" or \"evaluation\" modes with `model.train()` and `model.eval()`, respectively. In training mode, dropout is turned on, while in evaluation mode, dropout is turned off. This effects other modules as well that should be on during training but off during inference.\n",
    "\n",
    "The validation code consists of a forward pass through the validation set (also split into batches). With the log-softmax output, I calculate the loss on the validation set, as well as the prediction accuracy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the network, define the criterion and optimizer\n",
    "model = Network(784, 10, [516, 256], drop_p=0.5)\n",
    "criterion = nn.NLLLoss()\n",
    "optimizer = optim.Adam(model.parameters(), lr=0.001)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Implement a function for the validation pass\n",
    "def validation(model, testloader, criterion):\n",
    "    test_loss = 0\n",
    "    accuracy = 0\n",
    "    for images, labels in testloader:\n",
    "\n",
    "        images.resize_(images.shape[0], 784)\n",
    "\n",
    "        output = model.forward(images)\n",
    "        test_loss += criterion(output, labels).item()\n",
    "\n",
    "        ps = torch.exp(output)\n",
    "        equality = (labels.data == ps.max(dim=1)[1])\n",
    "        accuracy += equality.type(torch.FloatTensor).mean()\n",
    "    \n",
    "    return test_loss, accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "epochs = 2\n",
    "steps = 0\n",
    "running_loss = 0\n",
    "print_every = 40\n",
    "for e in range(epochs):\n",
    "    model.train()\n",
    "    for images, labels in trainloader:\n",
    "        steps += 1\n",
    "        \n",
    "        # Flatten images into a 784 long vector\n",
    "        images.resize_(images.size()[0], 784)\n",
    "        \n",
    "        optimizer.zero_grad()\n",
    "        \n",
    "        output = model.forward(images)\n",
    "        loss = criterion(output, labels)\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "        \n",
    "        running_loss += loss.item()\n",
    "        \n",
    "        if steps % print_every == 0:\n",
    "            # Make sure network is in eval mode for inference\n",
    "            model.eval()\n",
    "            \n",
    "            # Turn off gradients for validation, saves memory and computations\n",
    "            with torch.no_grad():\n",
    "                test_loss, accuracy = validation(model, testloader, criterion)\n",
    "                \n",
    "            print(\"Epoch: {}/{}.. \".format(e+1, epochs),\n",
    "                  \"Training Loss: {:.3f}.. \".format(running_loss/print_every),\n",
    "                  \"Test Loss: {:.3f}.. \".format(test_loss/len(testloader)),\n",
    "                  \"Test Accuracy: {:.3f}\".format(accuracy/len(testloader)))\n",
    "            \n",
    "            running_loss = 0\n",
    "            \n",
    "            # Make sure training is back on\n",
    "            model.train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inference\n",
    "\n",
    "Now that the model is trained, we can use it for inference. We've done this before, but now we need to remember to set the model in inference mode with `model.eval()`. You'll also want to turn off autograd with the `torch.no_grad()` context."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test out your network!\n",
    "\n",
    "model.eval()\n",
    "\n",
    "dataiter = iter(testloader)\n",
    "images, labels = dataiter.next()\n",
    "img = images[0]\n",
    "# Convert 2D image to 1D vector\n",
    "img = img.view(1, 784)\n",
    "\n",
    "# Calculate the class probabilities (softmax) for img\n",
    "with torch.no_grad():\n",
    "    output = model.forward(img)\n",
    "\n",
    "ps = torch.exp(output)\n",
    "\n",
    "# Plot the image and probabilities\n",
    "helper.view_classify(img.view(1, 28, 28), ps, version='Fashion')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next Up!\n",
    "\n",
    "In the next part, I'll show you how to save your trained models. In general, you won't want to train a model everytime you need it. Instead, you'll train once, save it, then load the model when you want to train more or use if for inference."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
